What is a data product?
We’ll begin this book by defining the topic of this class, data products. A
data product is the production output of a data analysis. For example, a data
analysis might build a clever machine learning algorithm. A data product
embeds that algorithm in a web site so that users can input values and get
predictions. Interactive analysis web sites, graphics, apps, R packages,
presentations and reports are all data products. In this book we focus on
only a few of these, mostly for space reasons, but also because our
Coursera specialization covers others (like report writing).
Before beginning this book, you should be functional in R. This language
will serve as the launching point for all of our data products. Fortunately, if
you don’t know R, Roger Peng has a great Coursera class and LeanPub book
on the subject; take and read those first. The class runs every month and
both can be obtained for free.
Why R? Well, for starters, it’s what I know. But it’s also a very prevalent
data analysis language, so it’s convenient to build the data product in the
same language as the analysis. In addition, the list of tools that one would
otherwise need to learn beyond R to develop data products is massive and
includes HTML5, JavaScript, D3, REST, Python, AWS, and so on. In some
sense, the tools we present are best thought of as prototyping tools before
building a larger production endeavor. However, for many applications,
they can stand alone. Shiny, in particular, is undergoing rapid adoption,
development and growth.
The goal of this book
This book (and the corresponding class) has one simple goal: get you
started on making data products by introducing you to some very neat tools
in R. We only scratch the surface on most of these fantastic platforms, and
sadly omit some important ones. It’s best to pursue this book with a simple
data project in mind. So, before beginning, think of a data-oriented web app